Experiments on Spam Detection with Boosting, SVM and Naive Bayes

Author

  • Carl Liu
Abstract

For this project, I implement three popular text classification algorithms for spam detection: AdaBoost, Support Vector Machines (SVM), and Naive Bayes. Their performance is evaluated on test datasets. All experiments are done in Matlab. The results show that all three algorithms perform satisfactorily on spam detection. In terms of accuracy, AdaBoost has the best error bound. On the other hand, Naive Bayes and SVM are superior in training and classification speed. In terms of overall performance, Naive Bayes is an ideal algorithm for spam detection.

Similar resources

Sentiment Based Twitter Spam Detection

Spam is becoming a serious threat for users of online social networks, especially Twitter. Twitter's structural features make it more vulnerable to spam attacks. In this paper, we propose a spam detection approach for Twitter based on sentiment features. We perform our experiments on a data collection of 29K tweets with 1K tweets for 29 trending topics of 2012 on twitt...

Ensemble of SVM Classifiers for Spam Filtering

Unsolicited commercial email, also known as spam, is becoming a serious problem for Internet users and providers (Fawcett, 2003). Several researchers have applied machine learning techniques to improve the detection of spam messages. Naive Bayes models are the most popular (Androutsopoulos, 2000), but other authors have applied Support Vector Machines (SVM) (Drucker, 1999), boosting and d...

Survey on Text Classification (Spam) Using Machine Learning

E-mail spam is a very serious problem today. It has many consequences: it lowers productivity, occupies space in mailboxes, spreads viruses, Trojans, and materials containing potentially harmful information for a certain category of users, and destabilizes mail servers. As a result, users spend a lot of time sorting incoming mail and deleting undesirable corresponde...

Boosting Trees for Anti-Spam Email Filtering

This paper describes a set of comparative experiments for the problem of automatically filtering unwanted electronic mail messages. Several variants of the AdaBoost algorithm with confidence-rated predictions (Schapire & Singer 99) have been applied, which differ in the complexity of the base learners considered. Two main conclusions can be drawn from our experiments: a) The boosting-based met...

A new feature selection algorithm based on binomial hypothesis testing for spam filtering

Content-based spam filtering is a binary text categorization problem. Feature selection, an important and indispensable means of text categorization, also plays an important role in improving spam filtering performance. We propose a new method, named Bi-Test, which utilizes binomial hypothesis testing to estimate whether the probability of a feature belonging to ...

Publication date: 2008